Bilingual phrase-to-phrase alignment for arbitrarily-small datasets

Author

  • Kevin Flanagan
Abstract

This paper presents a novel system for sub-sentential alignment of bilingual sentence pairs, however few, using readily available machine-readable bilingual dictionaries. Performance is evaluated against an existing gold-standard parallel corpus in which word alignments are annotated, showing a considerable improvement over a comparable system and over GIZA++ on the same corpus. Since naïve application of the system to N languages would require N(N - 1) dictionaries, it is also evaluated using a pivot language, where only 2(N - 1) dictionaries would be required, with surprisingly similar performance. The system is proposed as an alternative to statistical methods, for use with very small corpora or for ‘on-the-fly’ alignment.
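
The dictionary counts in the abstract can be checked with a short calculation. The sketch below is only an illustration and assumes that dictionaries are directional (one per ordered language pair) and that the pivot is one of the N languages; the function names are hypothetical and do not come from the paper.

```python
def direct_dictionaries(n: int) -> int:
    """Dictionaries needed to cover every ordered pair of n languages."""
    return n * (n - 1)


def pivot_dictionaries(n: int) -> int:
    """Dictionaries needed when each non-pivot language is paired
    only with the pivot, in both directions."""
    return 2 * (n - 1)


# Compare the two counts for a few values of N.
for n in (2, 3, 5, 10):
    print(f"N={n}: direct={direct_dictionaries(n)}, pivot={pivot_dictionaries(n)}")
```

Under these assumptions the two counts coincide for N = 2, and the pivot approach needs fewer dictionaries for every larger N.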

Similar resources

An Efficient Phrase-to-Phrase Alignment Model for Arbitrarily Long Phrase and Large Corpora

Most statistical machine translation (SMT) systems use phrase-to-phrase translations to capture local context information, leading to better lexical choices and more reliable word reordering. Long phrases capture more context than short phrases and result in better translation quality. On the other hand, the increasing amount of bilingual data poses serious problems for storing all possible ...

Phrase Alignment Based on Combination of Multiple Strategies

Phrase translation pairs are very useful for bilingual lexicography, machine translation systems, cross-lingual information retrieval, and many other applications in natural language processing. Parse trees of sentences contain phrase boundary information. Linguistic knowledge from translation and semantic lexicons, and statistical results from a bilingual corpus, can be used to align Chinese wo...

The ISL statistical translation system for spoken language translation

In this paper we describe the components of our statistical machine translation system used for the spoken language translation evaluation campaign. This system is based on phrase-to-phrase translations extracted from a bilingual corpus. A new phrase alignment approach will be introduced, which finds the target phrase by optimizing the overall word-to-word alignment for the sentence pair unde...

Multiple Linear Regression for Extracting Phrase Translation Pairs

Phrase translation pairs are very useful for bilingual lexicography, machine translation systems, cross-lingual information retrieval, and many other applications in natural language processing. Phrase translation pairs are always extracted from bilingual sentence pairs. In this paper, we extract phrase translation pairs based on word alignment results of Chinese-English bilingual sentence pairs and par...

Statistical Machine Translation Based on Hierarchical Phrase Alignment

This paper describes statistical machine translation improved by applying hierarchical phrase alignment. The hierarchical phrase alignment is a method to align bilingual sentences phrase-by-phrase employing the partial parse results. Based on the hierarchical phrase alignment, a translation model is trained on a chunked corpus by converting hierarchically aligned phrases into a sequence of chun...

Publication date: 2015